335 research outputs found

    Parallel Position Weight Matrices Algorithms

    Get PDF
    International audiencePosition Weight Matrices (PWMs) are broadly used in computational biology. The basic problems, Scan and MultipleScan, aim to find all the occurrences of a given PWM or a set of PWMs in long sequences. Some other PWM tasks share a common NP-hard subproblem, ScoreDistribution. The existing algorithms rely on the enumeration on a large set of scores or words, and they are mostly not suitable for parallelization. We propose a new algorithm, BucketScoreDistribution, that is both very efficient and suitable for parallelization. We bound the error induced by this algorithm. We realized a GPU prototype for Scan, MultipleScan and BucketScoreDistribution with the CUDA libraries, and report for the different problems speedups larger than 10× on several Nvidia cards

    Detecting Episodes with Harmonic Sequences for Fugue Analysis

    Get PDF
    International audienceFugues alternate between instances of the subject and of other patterns, such as the counter-subject, and modulatory sections called episodes. The episodes play an important role in the overall design of a fugue: detecting them may help the analysis of the fugue, in complement to a subject and a counter-subject detection. We propose an algorithm to retrieve episodes in the fugues of the first book of Bach's Well-Tempered Clavier, starting from a symbolic score which is already track-separated. The algorithm does not use any information on subject or counter-subject occurrences, but tries to detect partial harmonic sequences, that is similar pitch contour in at least two voices. For this, it uses a substitution function considering "quantized partially overlapping intervals" [Lemström and Laine, 98] and a strict length matching for all notes, except for the first and the last one. On half of the tested fugues, the algorithm has correct or good results, enabling to sketch the design of the fugue

    Melodic contour representations in the analysis of children's songs

    Get PDF
    http://igitur-archive.library.uu.nl/math/2013-0604-200726/UUindex.htmlInternational audienceChildren's songs is a particular musical genre related to folk music, with its own musical characteristics. This paper sets out to explore melodic contour in children's songs from seven different countries/nations across Europe. We look for distinctive contour patterns which differentiate the songs of each country. For pattern representation we use different viewpoints related to melodic contour, two of which also relying on beat information. Preliminary results are presented, and some initial observations regarding the patterns found, the representations used, and the genre as a whole, are discussed

    Manycore high-performance computing in bioinformatics

    Get PDF
    Mining the increasing amount of genomic data requires having very efficient tools. Increasing the efficiency can be obtained with better algorithms, but one could also take advantage of the hardware itself to reduce the application runtimes. Since a few years, issues with heat dissipation prevent the processors from having higher frequencies. One of the answers to maintain Moore's Law is parallel processing. Grid environments provide tools for effective implementation of coarse grain parallelization. Recently, another kind of hardware has attracted interest: multicore processors. Graphic processing units (GPUs) are a first step towards massively multicore processors. They allow everyone to have some teraflops of cheap computing power in its personal computer. The CUDA library (released in 2007) and the new standard OpenCL (specified in 2008) make programming of such devices very convenient. OpenCL is likely to gain a wide industrial support and to become a standard of choice for parallel programming. In all cases, the best speedups are obtained when combining precise algorithmic studies with a knowledge of the computing architectures. This is especially true with the memory hierarchy: the algorithms have to find a good balance between using large (and slow) global memories and some fast (but small) local memories. In this chapter, we will show how those manycore devices enable more efficient bioinformatics applications. We will first give some insights into architectures and parallelism. Then we will describe recent implementations specifically designed for manycore architectures, including algorithms on sequence alignment and RNA structure prediction. We will conclude with some thoughts about the dissemination of those algorithms and implementations: are they today available on the bookshelf for everyone

    RNA Locally Optimal Secondary Structures

    Get PDF
    International audienceRNA locally optimal secondary structures provide a concise and exhaustive description of all possible secondary structures of a given RNA sequence, and hence a very good representation of the RNA folding space. In this paper, we present an efficient algorithm which computes all locally optimal secondary structures for any folding model that takes into account the stability of helical regions. This algorithm is implemented in a software called regliss that runs on a publicly accessible web server

    Weighted Automata Computation of Edit Distances with Consolidations and Fragmentations

    Get PDF
    International audienceWe study edit distances between strings, based on operations of character substitutions, insertions, deletions and additionally consolidations and fragmentations. The two latter operations transform a sequence of characters into one character and vice-versa. They correspond to the compression and expansion in Dynamic Time-Warping algorithms for speech recognition and are also used for the formal analysis of written music.Such edit distances are not computable in general for arbitrary rulesets. We propose weighted automaton constructions to compute an edit distance taking into account both consolidations and deletions, or both fragmentations and insertions. Assuming that the operation ruleset has a constant size, these constructions are polynomial into the lengths of the involved strings. We finally show that the optimal weight of sequences made of consolidations chained with fragmentations, in that order, is computable for arbitrary rulesets, and not computable if we reverse the order of fragmentations and consolidations

    Rhythm extraction from polyphonic symbolic music

    Get PDF
    International audienceWe focus on the rhythmic component of symbolic music similarity, proposing several ways to extract a monophonic rhythmic signature from a symbolic poly- phonic score. To go beyond the simple extraction of all time intervals between onsets (noteson extraction), we select notes according to their length (short and long extractions) or their intensities (intensity+/− extractions). Once the rhythm is extracted, we use dynamic programming to compare several sequences. We report results of analysis on the size of rhythm patterns that are specific to a unique piece, as well as experiments on similarity queries (ragtime music and Bach chorale variations). These results show that long and intensity+ extractions are often good choices for rhythm extraction. Our conclusions are that, even from polyphonic symbolic music, rhythm alone can be enough to identify a piece or to perform pertinent music similarity queries, especially when using wise rhythm extractions

    What does the Mongeau-Sankoff algorithm compute?

    Get PDF
    How similar are two melodies? Proposed in 1990, the Mongeau-Sankoff algorithm computes the best alignment between two melodies with insertion, deletion, substitution , fragmentation, and consolidation operations. This popular algorithm is sometimes misunderstood. Indeed, computing the best edit distance, which is the best chain of operations, is a more elaborated problem. Our objective is to clarify the usage of the Mongeau-Sankoff algorithm. In particular, we observe that an alignment is a restricted case of edition. This is especially the case when some edit operations overlap, e.g. when one further changes one or several notes resulting of a fragmentation or a consolidation. We propose recommendations for people wanting to use or extend this algorithm, and discuss the design of combined or extended operations, with specific costs

    Path-equivalent developments in acyclic weighted automata

    Get PDF
    International audienceWeighted ïŹnite automata (WFA) are used with FPGA accelerating hardware to scan large genomic banks. Hardwiring such automata raises surface area and clock frequency constraints, requiring eïŹƒcient Δ-transitions-removal techniques. In this paper, we present bounds on the number of new transitions for the development of acyclic WFA, which is a special case of the Δ-transitions-removal problem. We introduce a new problem, a partial removal of Δ-transitions while accepting short chains of Δ-transitions

    Indexing labeled sequences

    Get PDF
    International audienceBackground: Labels are a way to add some information on a text, such as functional annotations such as genes on a DNA sequences. V(D)J recombinations are DNA recombinations involving two or three short genes in lymphocytes. Sequencing this short region (500 bp or less) produces labeled sequences and brings insight in the lymphocyte repertoire for onco-hematology or immunology studies. Methods: We present two indexes for a text with non-overlapping labels. They store the text in a Burrows–Wheeler transform (BWT) and a compressed label sequence in a Wavelet Tree. The label sequence is taken in the order of the text (TL-index) or in the order of the BWT (TL-BW-index). Both indexes need a space related to the entropy of the labeled text. Results: These indexes allow efficient text–label queries to count and find labeled patterns. The TL-BW-index has an overhead on simple label queries but is very efficient on combined pattern–label queries. We implemented the indexes in C++ and compared them against a baseline solution on pseudo-random as well as on V(D)J labeled texts. Discussion: New indexes such as the ones we proposed improve the way we index and query labeled texts as, for instance, lymphocyte repertoire for hematological and immunological studies
    • 

    corecore